Patent abstract:
It is a method and device for encoding data representative of a 3D scene in a container and a corresponding method and device for decoding the encoded data.
Publication number: BR112020007727A2
Application number: R112020007727-5
Filing date: 2018-10-03
Publication date: 2020-10-13
Inventors: Bertrand Chupeau; Gerard Briand; Mary-Luc Champel
Applicant: Interdigital Vc Holdings, Inc.
IPC primary class:
Patent description:

[001] [001] The present disclosure refers to the domain of volumetric video content. The present disclosure also refers to the context of the encoding and / or formatting of the data representative of the volumetric content, for example for rendering on end-user devices, such as mobile devices or Head Mounted Displays.
[002] [002] This section is intended to introduce the reader to various aspects of the technique, which may be related to various aspects of the present disclosure which are described and / or claimed below. This discussion is believed to be useful to provide the reader with basic information to facilitate a better understanding of the various aspects of the present invention. Consequently, it must be understood that these statements should be read in this light, and not as admissions to the prior art.
[003] [003] Recently, there has been a growth in the availability of large field of view content (up to 360°). Such content is potentially not completely visible to a user who watches the content on immersive display devices, such as Head Mounted Displays, smart glasses, PC screens, tablet computers, smart phones and the like. This means that, at any given time, a user may be viewing only a portion of the content.
[004] [004] Immersive video, also called 360° flat video, allows the user to watch everything around him by rotating his head around a fixed point of view. Rotations allow only a 3 Degrees of Freedom (3DoF) experience.
[005] [005] A large field of view content can be, among others, a three-dimensional computer graphics scene (3D CGI scene), a point cloud or an immersive video. Many terms can be used to refer to such immersive videos: Virtual Reality (VR), 360, panoramic, 4π steradian, immersive, omnidirectional or large field of view, for example.
[006] [006] Volumetric video (also known as 6 Degrees of Freedom (6DoF) video) is an alternative to 3DoF video. When watching a 6DoF video, in addition to rotations, the user can also move his head, and even his body, within the watched content and experience parallax and even volumes. Such videos considerably increase the sense of immersion and the perception of the depth of the scene and also avoid dizziness by providing consistent visual feedback during head translations. The content is created by means of dedicated sensors that allow the simultaneous recording of color and depth of the scene of interest. The use of color camera equipment combined with photogrammetry techniques is a common way of making such a recording.
[007] [007] Although 3DoF videos comprise a sequence of images that result from the demapping of texture images (for example, spherical images encoded according to the latitude / longitude projection mapping or equirectangular projection mapping), 6DoF video frames incorporate information from various points of view. They can be viewed as a time series of point clouds that result from a three-dimensional capture.
[008] [008] 3DoF videos can be encoded in a stream as a sequence of rectangular color images generated according to a chosen projection mapping (for example, cubic projection mapping, pyramidal projection mapping or equirectangular projection mapping). This encoding has the advantage of using standard image and video processing standards.
[009] [009] References in the specification to "one (1) modality", "a modality", "an exemplary modality", "a particular modality" indicate that the described modality may include a particular resource, structure or characteristic, but each modality may not necessarily include that particular resource, structure or characteristic.
[010] [010] The present disclosure refers to a method of encoding data representative of a 3D scene in a container, the method comprising:
[011] [011] - encode, in a first video track of the container, first data representative of the texture of the visible 3D scene according to a first point of view,
[012] [012] - encode, in at least a second video track of the container, second data representative of the geometry of the visible 3D scene, according to the set of points of view comprising the first point of view;
[013] [013] - encode, in a third video track of the container, third data representative of the texture of the 3D scene visible only from the points of view of the set that excludes the first point of view; and
[014] [014] - encode metadata in a fourth track of the container, the metadata being associated with the first data of the first video track, the second data of at least a second video track and the third data of the third video track, being that metadata comprises information representative of at least one projection used to obtain the second and third data.
[015] [015] The present disclosure refers to a device configured to encode data representative of a 3D scene in a container, the device comprising a memory associated with at least one processor configured for:
[016] [016] - encode, in a first video track of the container, the first data representative of the texture of the visible 3D scene according to a first point of view,
[017] [017] - encode, in at least a second video track of the container, second data representative of the geometry of the visible 3D scene, according to the set of points of view comprising the first point of view;
[018] [018] - encode, in a third video track of the container, third data representative of the texture of the 3D scene visible only from the points of view of the set that excludes the first point of view; and
[019] [019] - encode metadata in a fourth track of the container, the metadata being associated with the first data of the first video track, the second data of at least a second video track and the third data of the third video track, being that metadata comprises information representative of at least one projection used to obtain the second and third data.
[020] [020] The present disclosure refers to a device configured to encode data representative of a 3D scene in a container, the device comprising:
[021] [021] - an encoder configured to encode, in a first video track of the container, the first data representative of the texture of the visible 3D scene, according to a first point of view;
[022] [022] - an encoder configured to encode, in at least the second video track of the container, second data representative of the geometry of the visible 3D scene, according to the set of points of view comprising the first point of view;
[023] [023] - an encoder configured to encode, in a third video track of the container, third data representative of the texture of the 3D scene visible only from the points of view of the set that excludes the first point of view; and
[024] [024] - an encoder configured to encode metadata in a fourth track of the container, the metadata being associated with the first data of the first video track, the second data of at least a second video track and the third data of the third video track, and the metadata comprises information representative of at least one projection used to obtain the second and third data.
[025] [025] The present disclosure refers to a device configured to encode data representative of a 3D scene in a container, the device comprising:
[026] [026] - means to encode, in a first video track of the container, the first data representative of the texture of the visible 3D scene according to a first point of view,
[027] [027] - means to encode, in at least a second video track of the container, second data representative of the geometry of the visible 3D scene, according to the set of points of view comprising the first point of view;
[028] [028] - means to encode, in a third video track of the container, third data representative of the texture of the 3D scene visible only from the points of view of the set that excludes the first point of view; and
[029] [029] - means to encode metadata in a fourth track of the container, the metadata being associated with the first data of the first video track, the second data of at least a second video track and the third data of the third video track , and the metadata comprises information representative of at least one projection used to obtain the second and third data.
[030] [030] The present disclosure refers to a method of decoding data representative of a 3D scene from a container, the method comprising:
[031] [031] - decode, from a first video track of the container, the first data representative of the texture of the visible 3D scene according to a first point of view;
[032] [032] - decode, from at least a second video track of the container, second data representative of the geometry of the visible 3D scene, according to the set of points of view comprising the first point of view;
[033] [033] - decode, from a third video track of the container, third data representative of the texture of the 3D scene visible only from the points of view of the set that excludes the first point of view; and
[034] [034] - decode metadata from a fourth track of the container, said metadata being associated with the first data of the first video track, the second data of at least a second video track and the third data of the third video track, and the metadata comprises information representative of at least one projection used to obtain the second and third data.
[035] [035] The present disclosure refers to a device configured to decode data representative of a 3D scene from a container, the device comprising a memory associated with at least one processor configured for:
[036] [036] - decode, from a first video track of the container, the first data representative of the texture of the visible 3D scene according to a first point of view;
[037] [037] - decode, from at least a second video track of the container, second data representative of the geometry of the visible 3D scene, according to the set of points of view comprising the first point of view;
[038] [038] - decode, from a third video track of the container, third data representative of the texture of the 3D scene visible only from the points of view of the set that excludes the first point of view; and
[039] [039] - decode metadata from a fourth track of the container, said metadata being associated with the first data of the first video track, the second data of at least a second video track and the third data of the third video track, and the metadata comprises information representative of at least one projection used to obtain the second and third data.
[040] [040] The present disclosure refers to a device configured to decode data representative of a 3D scene from a container, the device comprising:
[041] [041] - a decoder configured to decode, from a first video track of the container, the first data representative of the texture of the visible 3D scene according to a first point of view;
[042] [042] - a decoder configured to decode, from at least a second video track of the container, second data representative of the geometry of the visible 3D scene according to the set of points of view comprising the first point of view;
[043] [043] - a decoder configured to decode, from a third video track of the container, third data representative of the texture of the 3D scene visible only from the points of view of the set that excludes the first point of view, and
[044] [044] - a decoder configured to decode metadata from a fourth track of the container, said metadata being associated with the first data of the first video track, the second data of at least a second video track and
[045] [045] the third data of the third video track, and said metadata comprises information representative of at least one projection used to obtain the second and third data.
[046] [046] The present disclosure refers to a device configured to decode data representative of a 3D scene from a container, the device comprising:
[047] [047] - means to decode, from a first video track of the container, the first data representative of the texture of the visible 3D scene according to a first point of view,
[048] [048] - means to decode, from at least a second video track of the container, second data representative of the geometry of the visible 3D scene, according to the set of points of view comprising the first point of view;
[049] [049] - means for decoding, from a third video track of the container, third data representative of the texture of the 3D scene visible only from the points of view of the set that excludes the first point of view; and
[050] [050] - means for decoding metadata from a fourth track of the container, said metadata being associated with the first data of the first video track, the second data of at least a second video track and the third data of the third video track, the metadata comprising information representative of at least one projection used to obtain the second and third data.
[051] [051] According to a particular feature, the first video track refers to a first bit stream syntax element, at least a second video track refers to at least a second bit stream syntax element and the third video track refers to a third bit stream syntax element.
[052] [052] According to a specific characteristic, the second data comprises a first information representative of a projection format used to obtain the geometry, projection parameters and a flag that indicates whether at least some of the projection parameters are dynamically updated.
[053] [053] According to another characteristic, the third data comprises a second information representative of a projection format used to obtain the texture, projection parameters and a flag that indicates whether at least some of the projection parameters are dynamically updated.
[054] [054] According to an additional feature, the first video track and at least a second video track are grouped in the same group of tracks when the first information and the second information are identical.
[055] [055] According to a particular characteristic, metadata comprises at least one of the following information:
[056] [056] - information representative of at least one point of view associated with at least one projection used to obtain the geometry and texture;
[057] [057] - information representative of a rectangular 2D geometry patch package, with each geometry patch associated with the projection of a part of the 3D scene;
[058] [058] - information representative of a rectangular 2D texture patch package, each texture patch being associated with the projection of a part of the 3D scene;
[059] [059] - information representative of several 3D patches, each 3D patch being associated with a part of the 3D scene and associated with an identifier in the second and first video tracks or in the third video track.
[060] [060] The present disclosure also refers to a bit stream that carries data representative of a 3D scene, the data comprising, in a first video track of a container, first data representative of the texture of the 3D scene visible according to a first point of view; in at least a second video track of the container, second data representative of the geometry of the 3D scene visible according to the set of points of view comprising the first point of view; in a third video track of the container, third data representative of the texture of the 3D scene visible only from the points of view of the set that excludes the first point of view; and metadata in a fourth track of the container, the metadata being associated with the first data of the first video track, the second data of at least a second video track and the third data of the third video track, with the metadata comprising information representative of at least one projection used to obtain the second and third data.
[061] [061] The present disclosure also relates to a computer program product that comprises program code instructions for performing the steps of the method of encoding or decoding data representative of a 3D scene, when that program is run on a computer.
[062] [062] The present disclosure also refers to a processor-readable (non-transitory) medium that has stored in it instructions to make a processor perform at least the encoding or decoding method mentioned above for the representative data of a 3D scene.
[063] [063] The present disclosure will be better understood, and other specific features and advantages will emerge after reading the following description, the description referring to the attached drawings, in which:
[064] [064] - Figure 1 shows a three-dimensional (3D) model of an object and the points of a point cloud that correspond to the 3D model, according to a non-restrictive modality of the present principles;
[065] [065] - Figure 2 shows an image that represents the three-dimensional scene that comprises a surface representation of various objects, according to a non-restrictive modality of the present principles;
[066] [066] - Figure 3 illustrates an exemplary disposition of points of view in the scene of Figure 2 and points visible from that scene from different points of view of that disposition, according to a non-restrictive modality of the present principles;
[067] [067] - Figure 4 illustrates the parallax experience by showing different views of the scene in Figure 2, according to the point of view of Figure 3, according to a non-restrictive modality of the present principles;
[068] [068] - Figure 5 shows a texture image of the points of the scene in Figure 2 visible from the point of view of Figure 3, according to an equirectangular projection mapping, according to a non-restrictive modality of the present principles;
[069] [069] - Figure 6 shows an image of the same points of the scene as in Figure 5, coded according to a cubic projection mapping, according to a non-restrictive modality of the present principles;
[070] [070] - Figure 7 shows a depth image (also called a depth map) of the 3D scene in Figure 2, according to the point of view of Figure 3, according to a non-restrictive modality of the present principles;
[071] [071] - Figures 8A and 8B illustrate a part of a depth patch atlas for points in the scene projected on the texture map of Figure 5, according to a non-restrictive modality of the present principles;
[072] [072] - Figure 9 shows an encoding of residual points as patches after encoding the image of Figure 5 or Figure 6, according to a non-restrictive modality of the present principles;
[073] [073] - Figure 10 illustrates an example of encoding, transmitting and decoding a sequence of the 3D scene in a format that is, at the same time, compatible with 3DoF rendering and compatible with 3DoF + rendering, according to a non-restrictive modality of the present principles;
[074] [074] - Figure 11 shows a process of obtaining, encoding and / or formatting data representative of the 3D scene in Figure 2, according to a non-restrictive modality of the present principles;
[075] [075] - Figure 12 shows a process of decoding and rendering the 3D scene of Figure 2, according to a non-restrictive modality of the present principles;
[076] [076] - Figure 13 shows an example of a container that comprises information representative of the 3D scene in Figure 2, according to a non-restrictive modality of the present principles;
[077] [077] - Figure 14 shows an example of the syntax of a bit stream that carries the information and data representative of the 3D scene in Figure 2, according to a non-restrictive modality of the present principles;
[078] [078] - Figure 15 shows an exemplary architecture of a device that can be configured to implement a method described in relation to Figures 11, 12, 16 and / or 17, according to a non-restrictive modality of the present principles;
[079] [079] - Figure 16 illustrates a method of encoding data representative of the 3D scene of Figure 2, implemented, for example, in the device of Figure 15, according to a non-restrictive modality of the present principles;
[080] [080] - Figure 17 illustrates a method for decoding data representative of the 3D scene in Figure 2, implemented, for example, in the device in Figure 15, according to a non-restrictive modality of the present principles.
[081] [081] The subject is now described with reference to the drawings, in which similar numerical references are used to refer to similar elements throughout the document. In the following description, for the purpose of explanation, numerous specific details are presented in order to provide a complete understanding of the subject. However, it may be evident that the modalities of the subject can be practiced without these specific details.
[082] [082] The present description illustrates the principles of the present disclosure. It will thus be appreciated that those skilled in the art will be able to conceive various arrangements that, although not explicitly described or shown in this document, embody the principles of the disclosure.
[083] [083] In accordance with non-limiting modalities of the present disclosure, methods and devices for encoding images of a volumetric video (also called 3DoF + or 6DoF video) in a container and / or in a bit stream are disclosed. Methods and devices for decoding images of a volumetric video from a stream are also disclosed. Examples of the syntax of a bit stream for encoding one or more images of a volumetric video are also disclosed.
[084] [084] According to a first aspect, the present principles will be described with reference to a first particular modality of a method of (and a device configured for) encoding data representative of a 3D scene (represented with an omnidirectional content, also called immersive video) in a container and / or a bit stream. To achieve this goal, first data representative of the texture (for example, color information associated with the elements, for example, points, of the 3D scene) of the 3D scene visible according to a first point of view are encoded in a first video track of the container. Second data representative of the geometry of the 3D scene visible according to a set or range of points of view are encoded in a second video track of the container, the set of points of view comprising the first point of view (for example, being centered on the first point of view). Third data representative of the texture of the 3D scene are also encoded in a third video track of the container. The third data correspond, for example, to the texture information associated with the parts of the 3D scene that are visible from the points of view of the set of points of view, excluding the part of the scene that is visible according to the first point of view, in order to avoid encoding the same information twice (that is, once in the first video track and once in the third video track). Metadata are encoded in a fourth track of the container, and the metadata comprise information (for example, parameters) representative of one or more projections used to obtain the second data and the third data.
[085] [085] A corresponding method of (and a device configured for) decoding data representative of the 3D scene is also described in relation to the first aspect of the present principles.
[086] [086] Figure 1 shows a three-dimensional (3D) model of an object 10 and points of a point cloud 11 corresponding to the 3D model 10. Model 10 can be a 3D mesh representation and the points of point cloud 11 can be the vertices of the mesh. Points 11 can also be points scattered on the surface of the faces of the mesh. Model 10 can also be represented as a splatted version of point cloud 11, where the surface of model 10 is created by splatting the points of point cloud 11. Model 10 can be represented by many different representations, such as voxels or splines.
[087] [087] A point cloud can be seen as a vector-based structure, where each point has its coordinates (for example, three-dimensional XYZ coordinates, or a depth / distance from a given point of view) and one or more attributes, also called components. An example of a component is the color component, which can be expressed in different color spaces, for example, RGB (Red, Green and Blue) or YUV (Y being the luma component and UV two chrominance components); a minimal data-structure sketch is given after the list below. The point cloud is a representation of the object as seen from a given point of view, or a range of points of view. The point cloud can be obtained in different ways, for example:
[088] [088] - from a capture of a real object captured by camera equipment, optionally complemented by an active depth sensing device;
[089] [089] - from a capture of a virtual / synthetic object captured by virtual camera equipment in a modeling tool;
[090] [090] - from a mixture of both real and virtual objects.
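Purely by way of illustration, a point of such a point cloud can be sketched as a position together with optional components; the class and field names below are hypothetical and not part of the present disclosure:

from dataclasses import dataclass, field
from typing import Tuple, List

@dataclass
class Point:
    xyz: Tuple[float, float, float]          # three-dimensional coordinates
    rgb: Tuple[int, int, int] = (0, 0, 0)    # color component (could also be YUV)

@dataclass
class PointCloud:
    points: List[Point] = field(default_factory=list)

# Example: a cloud with a single red point one meter in front of the origin.
cloud = PointCloud([Point(xyz=(0.0, 0.0, 1.0), rgb=(255, 0, 0))])
print(len(cloud.points))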
[091] [091] Figure 2 shows an image that represents a three-dimensional scene that comprises a surface representation of several objects. The scene may have been captured using any suitable technology. For example, it may have been created using graphical computer interface (CGI) tools. It may have been captured by color and depth image acquisition devices. In such a case, it is possible that part of the objects that are not visible from the acquisition devices (for example, cameras) cannot be represented in the scene, as described in relation to Figures 3, 8A and 8B. The exemplary scene illustrated in Figure 2 comprises houses, two characters and a well. Cube 33 in Figure 2 illustrates a viewing space from which a user is likely to observe the 3D scene.
[092] [092] Figure 3 shows an exemplary arrangement of points of view in a scene, for example, the 3D scene 20 of Figure 2. Figure 3 also shows the points of that 3D scene 20 that are visible from / according to different points of view of that arrangement. In order to be rendered and displayed by an immersive rendering device (for example, a CAVE or a Head Mounted Display (HMD) device), a 3D scene is considered from a first point of view, for example, the first point of view 30. Point 31 of the scene, which corresponds to the first character's right elbow, is visible from the first point of view 30, since there is no opaque object between the first point of view 30 and the scene point 31. On the contrary, point 32 of the 3D scene 20, which corresponds, for example, to the left elbow of the second character, is not visible from the first point of view 30, since it is occluded by points of the first character. For 3DoF rendering, only one point of view, for example, the first point of view 30, is considered. The user can rotate his head in three degrees of freedom around the first point of view to watch different parts of the 3D scene, but the user cannot move the first point of view. The points of the scene to be encoded in the stream are the points that are visible from that first point of view. There is no need to encode points of the scene that are not visible from that first point of view, since the user cannot access them by moving the first point of view.
[093] [093] Regarding 6DoF rendering, the user can move the viewpoint anywhere in the scene. In this case, it is valuable to code each point of the scene in the content stream, as each point is potentially accessible by a user who can move his point of view. In the coding stage, there is no way to know, a priori, from which point of view the user will observe the 3D scene 20.
[094] [094] Regarding 3DoF + rendering, the user can move the point of view within a limited space around a point of view, for example, around the first point of view 30. For example, the user can move his point of view inside a cube 33 centered on the first point of view 30. This makes it possible to experience parallax, as illustrated in relation to Figure 4. The data representative of the part of the scene visible from any point of the viewing space, for example, cube 33, must be encoded in the stream, including the data representative of the 3D scene visible according to the first point of view 30. The size and shape of the viewing space can, for example, be decided and determined at the encoding stage and encoded in the stream. The decoder obtains this information from the stream and the renderer limits the viewing space to the space determined by the obtained information. According to another example, the renderer determines the viewing space according to hardware restrictions, for example, in relation to the capabilities of the sensor (or sensors) that detects the user's movements. In such a case, if a point visible from a point within the renderer's viewing space has not been encoded in the data stream during the encoding phase, that point will not be rendered. According to an additional example, the data (for example, texture and / or geometry) representative of each point of the 3D scene are encoded in the stream without considering the rendering viewing space. To optimize the size of the stream, only a subset of the points of the scene can be encoded, for example, the subset of points that can be viewed according to a rendering viewing space.
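As a purely illustrative sketch of the viewing-space constraint described above (the viewing space is assumed here to be a cube such as cube 33, centered on the first point of view; the helper below is hypothetical):

def inside_viewing_space(viewpoint, center, half_edge):
    """Return True if 'viewpoint' lies within a cube of edge 2*half_edge
    centered on 'center' (both given as (x, y, z) tuples)."""
    return all(abs(v - c) <= half_edge for v, c in zip(viewpoint, center))

# The first point of view is the cube center; a small head translation stays inside.
first_point_of_view = (0.0, 0.0, 0.0)
print(inside_viewing_space((0.1, 0.0, -0.05), first_point_of_view, half_edge=0.25))  # True
print(inside_viewing_space((1.0, 0.0, 0.0), first_point_of_view, half_edge=0.25))    # False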
[095] [095] Figure 4 illustrates the parallax experience that is allowed by volumetric rendering (i.e., 3DoF + and 6DoF). Figure 4B illustrates the part of the scene that a user can see from the first point of view 30 of Figure 3. From that first point of view, the two characters are in a given spatial configuration, for example, the left elbow of the second character (with a white shirt) is hidden by the body of the first character while his head is visible. When the user rotates his head in three degrees of freedom around the first point of view 30, this configuration does not change. If the point of view is fixed, the second character's left elbow is not visible. Figure 4A illustrates the same part of the scene seen from a point of view located on the left side of the viewing space 33 of Figure 3. From such a point of view, point 32 of Figure 3 is visible due to the parallax effect. Therefore, for volumetric rendering, point 32 needs to be encoded in the stream. If it is not encoded, that point 32 will not be rendered. Figure 4C illustrates the same part of the scene seen from a point of view located on the right side of the viewing space 33 of Figure 3. From that point of view, the second character is almost completely hidden by the first character.
[096] [096] When moving the viewpoint within the 3D scene, the user can experience the parallax effect.
[097] [097] Figure 5 shows a texture image (also called a color image) that comprises the texture information (for example, RGB data or YUV data) of the points in the 3D scene 20 that are visible from the first point of view 30 of Figure 3, and this texture information is obtained according to an equirectangular projection mapping. The equirectangular projection mapping is an example of spherical projection mapping.
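By way of illustration only, a minimal sketch of how an equirectangular projection mapping can associate a viewing direction with pixel coordinates of a texture image such as the one of Figure 5 (a simplified convention; the exact mapping used by an encoder may differ):

import math

def equirectangular_project(direction, width, height):
    """Map a unit-length viewing direction (x, y, z) to (u, v) pixel
    coordinates of an equirectangular texture image of size width x height."""
    x, y, z = direction
    longitude = math.atan2(x, z)          # in [-pi, pi]
    latitude = math.asin(y)               # in [-pi/2, pi/2]
    u = (longitude / (2 * math.pi) + 0.5) * width
    v = (0.5 - latitude / math.pi) * height
    return u, v

# A direction straight ahead lands at the center of the image.
print(equirectangular_project((0.0, 0.0, 1.0), 4096, 2048))  # (2048.0, 1024.0)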
[098] [098] Figure 6 shows an image of the same points in the scene obtained or coded according to a cubic projection mapping. There are different types of cubic projection mappings. For example, the faces of the cube can be arranged differently in the image of Figure 6 and / or the faces can be oriented differently.
[099] [099] The projection mapping used to obtain / encode scene points visible from the determined point of view is selected, for example, according to compression criteria, or, for example, according to a standard option. It is known by a person skilled in the art that it is always possible to convert an image obtained by projecting a point cloud according to a projection mapping into an equivalent image of the same point cloud according to a different projection mapping. Such a conversion, however, may imply some loss in the resolution of the projection.
[0100] [0100] Figures 5 and 6 are shown in shades of gray. It is naturally understood that they are examples of texture (color) images (which encode the texture (color) of the points in the scene), for example, in RGB or YUV. Figures 5 and 6 comprise data necessary for a 3DoF rendering of the 3D scene. A decoder that receives a bit stream or data stream that comprises, in a first syntax element, an image like the exemplary images in Figures 5 and 6, decodes the image using a method correlated with the method used for the image encoding. The stream can be encoded according to standard image and video compression methods and standard format for image and video transport, for example, MPEG-2, H.264 or HEVC. The decoder can transmit the decoded image (or image sequence) to a 3DoF renderer or to a module for reformatting, for example. A 3DoF renderer can project the image onto a surface that corresponds to the projection mapping used in the coding (for example, a sphere for the image in Figure 5, a cube for the image in Figure 6). In a variant, the renderer converts the image, according to a different projection mapping before projecting it.
[0101] [0101] An image is compatible with 3DoF rendering when the image encodes points in a 3D scene, according to a projection mapping. The scene can comprise 360 ° points. The projection mappings commonly used to encode images compatible with 3DoF rendering are, for example, between spherical mappings: equirectangular projection or longitude / latitude projection, or different layouts of cubic projection mappings or pyramidal projection mappings.
[0102] [0102] Figure 7 shows a depth image (also called a depth map) of the 3D scene 20, according to the first point of view 30.
[0103] [0103] According to another embodiment, the depth of the points visible from the determined point of view, for example point 30 of Figure 3, can be encoded as a patch atlas. Figure 8A illustrates a portion of a depth patch atlas 83 for the points of the scene projected onto the color map 80 of Figure 5. A patch is an image obtained by grouping projected points. A patch corresponds to a part of the projected points that define an area of adjacent pixels on the projection map and that are consistent in depth. The part is defined by the angular range that the corresponding projected points occupy in space from the point of view. The patches are grouped on the projection map according to their connectivity and depth. An area P covers a set of adjacent pixels on the projection map where a projection occurred, and which is consistent in depth. The depth consistency check comes down to considering the distance Z between the point of view and each projected point covered by P and ensuring that the distance range of these pixels is not deeper than a limit T. This limit may depend on Zmax (the maximum distance between the point of view and the projected pixels covered by P), the dynamic range D of the depth stored in the image generated by the additional generation operation and the perceptual properties. For example, the typical human visual acuity is about three minutes of arc. Determining the limit T according to these criteria has several advantages. On the one hand, an image patch in the image generated in the additional generation operation will cover a depth range consistent with the pixel depth resolution of the generated image (for example, 10 bits or 12 bits) and will thus be robust to compression artifacts. On the other hand, the depth range is perceptually driven by the 3DoF + context.
[0104] [0104] Where VA is a value for visual acuity.
[0105] [0105] For example, patch 81 is obtained for the left arm of the first character. The depth encoding of this part of the projected points of the scene is valuable, as the 2D values of the dynamic range are used to encode a short distance of a few decimeters, allowing higher accuracy for the depth encoding and higher robustness to compression artifacts.
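The depth consistency criterion can be sketched as follows; the limit is taken here as a plain constant, whereas the limit T described above depends on Zmax, the dynamic range D and the visual acuity VA, whose exact expression is not reproduced here:

def depth_consistent(depths, threshold):
    """Return True if the range of distances Z covered by an area P of
    projected pixels is not deeper than the limit 'threshold'."""
    return (max(depths) - min(depths)) <= threshold

# Pixels of a candidate patch spanning 1.2 m of depth against a 0.5 m limit.
print(depth_consistent([2.0, 2.4, 3.2], threshold=0.5))  # False
print(depth_consistent([2.0, 2.1, 2.3], threshold=0.5))  # True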
[0106] [0106] The patches are arranged in an image 83, called patch atlas 83, with a given angular resolution (for example, 3 seconds of arc per pixel or 5 seconds of arc per pixel), according to the size that the projection of the points of the patch will occupy in the patch atlas. The arrangement consists of reserving an area in the patch atlas to project (depth and color) the points associated with the patch. The size of the reserved area depends on the angular resolution of the image and on the angular range of the patch. The location of the areas in the frame is optimized to cover the image frame without overlap. A patch data item comprises mapping data that maps a depth patch packed in the depth patch atlas to the corresponding area of color pixels in the color image. For example, a patch data item comprises the coordinates of the left corner of the patch in the patch atlas, the width and height of the patch in the patch atlas, the left corner of the corresponding color pixels in the color image, and the width and height of the area of the corresponding color pixels in the color image. In a variant, the information of a patch data item is represented by angular range data in order to facilitate localization in a color image encoded, for example, according to a spherical projection mapping.
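Purely as an illustration of such a patch data item (the field names are hypothetical), the mapping between a depth patch packed in the patch atlas and the corresponding area of color pixels could be recorded as:

from dataclasses import dataclass

@dataclass
class PatchDataItem:
    # Location and size of the depth patch in the patch atlas (in pixels).
    atlas_left: int
    atlas_top: int
    atlas_width: int
    atlas_height: int
    # Location and size of the corresponding color pixels in the color image.
    color_left: int
    color_top: int
    color_width: int
    color_height: int

# A patch such as patch 81 (the first character's arm) as it might be referenced.
item = PatchDataItem(atlas_left=0, atlas_top=0, atlas_width=64, atlas_height=128,
                     color_left=1200, color_top=600, color_width=64, color_height=128)
print(item)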
[0107] [0107] The points visible from a given (or determined) point of view are only a part of the points of the 3D scene. To fully encode the 3D scene, the residual points (that is, the points that were not encoded in the 3DoF-compatible color image, and the corresponding depth data) are encoded in the stream. Figure 9 shows the encoding of such residual points as patches.
[0108] [0108] Figure 8B illustrates the obtaining of patches for a part of the 3D scene (for example, one of the characters of the 3D scene 20) that are packed in a patch atlas 801, according to another non-limiting example of the present principles. The point cloud representing the 3D object 8 is partitioned into a plurality of 3D parts, for example 50, 100, 1,000 or more 3D parts, 3 of which are illustrated in Figure 8B, that is, the 3D parts 802, 803 and 804, the 3D part 804 comprising points of the point cloud representing part of the person's head, the 3D part 802 comprising points of the point cloud representing an armpit of the person and the 3D part 803 comprising points of the point cloud representing a hand of the person. One or more patches of each 3D part, or of a part of the 3D parts, are generated to represent each 3D part in two dimensions, that is, according to a 2D parameterization. For example, a 2D parameterization 8001 is obtained for the 3D part 804, a 2D parameterization 8002 is obtained for the 3D part 802 and 2 different 2D parameterizations 8003 and 8004 are obtained for the 3D part 803. The 2D parameterization can vary from one 3D part to another. For example, the 2D parameterization 8001 associated with the 3D part 804 is a linear perspective projection while the 2D parameterization 8002 associated with the 3D part 802 is an LLE and the 2D parameterizations 8003 and 8004 associated with the 3D part 803 are both orthographic projections according to different points of view. According to a variant, all the 2D parameterizations associated with all the 3D parts are of the same type, for example, a linear perspective projection or an orthographic projection. According to a variant, different 2D parameterizations can be used for the same 3D part.
[0109] [0109] A 2D parameterization associated with a given 3D part of the point cloud corresponds to a navigation in 2 dimensions of the given 3D part of the point cloud allowing the sampling of the given 3D part, that is, a 2D representation of the content (i.e., the point or points) of that given 3D part comprising a plurality of samples (which may correspond to the pixels of a first image), the number of which depends on the sampling step that is applied. A 2D parameterization can be obtained in several ways, for example, by implementing one of the following methods (an illustrative sketch of one of them is given after the list):
[0110] [0110] - linear perspective projection of the points of the 3D part of the point cloud onto a plane associated with a point of view, the parameters representative of the linear perspective projection comprising the location of the virtual camera, the spatial sampling step and the field of view in 2 dimensions;
[0111] [0111] - orthographic projection of the points of the 3D part of the point cloud on a surface, with the representative parameters of the orthographic projection comprising the geometry (shape, size and orientation) of the projection surface and the spatial sampling step;
[0112] [0112] - LLE (Locally-Linear Embedding), which corresponds to a mathematical operation of dimension reduction, applied here to convert / transform from 3D to 2D, the parameters representative of the LLE comprising the transformation coefficients.
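As an illustrative sketch of the orthographic case only (projection onto the XY plane with a regular sampling step; a simplification, not the exact parameterization of the present disclosure):

def orthographic_parameterization(points_3d, sampling_step):
    """Project 3D points of a 3D part onto the XY plane and sample them on a
    regular grid; returns a dict mapping (u, v) grid samples to depth (z)."""
    samples = {}
    for x, y, z in points_3d:
        u = int(round(x / sampling_step))
        v = int(round(y / sampling_step))
        # Keep the closest point when several points fall on the same sample.
        if (u, v) not in samples or z < samples[(u, v)]:
            samples[(u, v)] = z
    return samples

part = [(0.00, 0.00, 2.0), (0.01, 0.00, 2.1), (0.05, 0.05, 1.9)]
print(orthographic_parameterization(part, sampling_step=0.05))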
[0113] [0113] Each patch advantageously has a rectangular shape to facilitate the packing process in the patch atlas 801. The patch atlas 801 can be a geometry patch atlas, that is, an image of pixels comprising different patches 8011, 8012, 8014 (which can be seen as arrays of pixels, for example), the geometry information obtained by the 2D projection / parameterization of the points of the associated 3D part being associated with each pixel. The geometry information can correspond to depth information or to information on the position of the vertices of mesh elements. A corresponding texture patch atlas comprising the texture information associated with the 3D parts can be obtained in the same way.
[0114] [0114] The mapping information that links each 2D parameterization to its associated patch in the geometry patch atlas and in the texture patch atlas can be generated. Mapping information can be generated to maintain the connection between a 2D parameterization and the associated geometry patch and texture patch, respectively, in the geometry patch atlas and the texture patch atlas. The mapping information can be, for example, in the form of:
[0115] [0115] {2D parameterization parameters, geometry patch ID; Texture patch ID}
[0116] [0116] where the geometry patch ID can be an integer value or a pair of values comprising the column index U and the row index V to which the geometry patch belongs in the patch matrix of the geometry patch atlas; the texture patch ID can be an integer value or a pair of values comprising the column index U' and the row index V' to which the texture patch belongs in the patch matrix of the texture patch atlas.
[0117] [0117] When the geometry patches and texture patches are arranged according to the same arrangement in the geometry patch atlas and in the texture patch atlas, the geometry patch ID and the texture patch ID are the same and the mapping information can be, for example, in the form of:
[0118] [0118] {2D parameterization parameters, geometry and texture patch ID}
[0119] [0119] where the geometry and texture patch ID identifies both the geometry patch in the geometry patch atlas and the texture patch in the texture patch atlas, either through the same integer value associated with both the geometry patch and the texture patch or through the pair of values comprising the column index U and the row index V to which the geometry and texture patches belong, respectively, in the geometry patch atlas and in the texture patch atlas.
[0120] [0120] The same mapping information is generated for each 2D parameterization and associated geometry and texture patch. Such mapping information allows the reconstruction of the corresponding parts of the 3D scene by establishing the association of 2D parameterization with the corresponding geometry patch and texture patch. If the 2D parameterization is a projection, the corresponding part of the 3D scene can be reconstructed by deprojection (performing the reverse projection) of the geometry information included in the associated geometry patch and the texture information in the associated texture patch. The mapping information then corresponds to a list of mapping information:
[0121] [0121] {2D parameterization parameters; Geometry and texture patch ID} i,
[0122] [0122] For i = 1 to n, with n the number of 2D parameterizations.
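Purely by way of illustration, such a list of mapping information could be represented as follows; the key names are hypothetical placeholders for the fields listed above:

# One entry per 2D parameterization, i = 1..n.
mapping_information = [
    {
        "parameterization_parameters": {"type": "orthographic", "sampling_step": 0.05},
        "geometry_patch_id": (2, 3),   # column index U, row index V in the geometry patch atlas
        "texture_patch_id": (2, 3),    # column index U', row index V' in the texture patch atlas
    },
    # ... one item per 2D parameterization
]
print(len(mapping_information))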
[0123] [0123] Figure 10 illustrates an example of encoding, transmitting and decoding a sequence of 3D scenes in a format that is, at the same time, compatible with 3DoF rendering and with volumetric rendering. A three-dimensional scene 100 (or a sequence of 3D scenes) is encoded in a stream 102 by an encoder 101. Stream 102 comprises a first syntax element that carries data representative of a 3D scene for 3DoF rendering and at least a second syntax element that carries data representative of the 3D scene for 3DoF + rendering. A decoder 103 obtains stream 102 from a source. For example, the source belongs to a set that comprises:
[0124] [0124] - a local memory, for example, a video memory or a RAM (or Random Access Memory), a flash memory, a ROM (or Read Only Memory), a hard disk;
[0125] [0125] - a storage interface, for example, an interface with mass storage, a RAM, a flash memory, a ROM, an optical disk or a magnetic disk;
[0126] [0126] - a communication interface, for example, a wired interface (for example, a bus interface, a wide area network interface, a LAN interface) or a wireless interface (such as an IEEE 802.11 interface or a Bluetooth® interface); and
[0127] [0127] - a user interface, such as a Graphical User Interface that allows a user to enter data.
[0128] [0128] Decoder 103 decodes the first syntax element of stream 102 for 3DoF rendering 104. For 3DoF + rendering 105, the decoder decodes both the first syntax element and the second syntax element of stream 102.
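A minimal sketch of that decoding choice (function and element names are hypothetical): a 3DoF renderer consumes only the first syntax element, while a 3DoF + renderer consumes both:

def decode_for_rendering(stream, mode):
    """Select which syntax elements of the stream are decoded, depending on
    the rendering mode ('3dof' or '3dof+'). 'stream' is a dict of elements."""
    decoded = {"texture_first_viewpoint": stream["first_syntax_element"]}
    if mode == "3dof+":
        decoded["geometry_and_residual_texture"] = stream["second_syntax_element"]
    return decoded

stream = {"first_syntax_element": b"...", "second_syntax_element": b"..."}
print(decode_for_rendering(stream, "3dof").keys())
print(decode_for_rendering(stream, "3dof+").keys())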
[0129] [0129] Figure 11 shows a process for obtaining, encoding, formatting and / or encapsulating data representative of the 3D scene 20, according to a non-restrictive modality of the present principles.
[0130] [0130] In an operation 111, the data associated with the elements (for example, points) of the 3D scene are acquired, the data corresponding to attributes associated with the elements of the scene, that is, texture (color) attributes and / or geometry attributes. For example, a sequence of temporally successive images can be acquired. The texture attributes can be acquired with one or more photosensors and the geometry attributes can be acquired, for example, with one or more depth sensors. According to a variant, the 3D scene is obtained with CGI (Computer Generated Images) technology. At least a part of the 3D scene is visible according to a plurality of points of view, for example, according to a range of points of view that includes a first central point of view. According to a variant, the 3D scene is neither acquired nor generated through CGI, but retrieved from the cloud, from a library of omnidirectional contents or from any storage unit or device. An audio track associated with the 3D scene can also optionally be acquired.
[0131] [0131] In a 112 operation, the 3D scene is processed. The images of the 3D scene can be, for example, stitched if acquired with a plurality of cameras. During operation 112, it is signaled to a video encoder in which format the 3D scene representation can be encoded, for example, according to the H.264 standard or HEVC standard. During operation 112, it is additionally signaled which 3D to 2D transformation should be used to represent the 3D scene. The 3D to 2D transformation can be, for example, through one of the 2D parameterization examples or one of the projections described above.
[0132] [0132] In an operation 113, the sound information acquired with the first video, when any sound was acquired, is encoded in an audio track according to a determined format, for example, according to the AAC (Advanced Audio Coding) standard, WMA (Windows Media Audio) or MPEG-1/2 Audio Layer 3.
[0133] [0133] In an operation 114, the 3D scene data (that is, the attributes associated with the elements (points or mesh elements)) are encoded in syntax elements or video tracks of a bit stream, according to a determined format, for example, according to H.264 / MPEG-4 AVC: "Advanced video coding for generic audiovisual Services", SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS, Recommendation ITU-T H.264, Telecommunication Standardization Sector of ITU, February 2014, or according to HEVC / H265: "ITU-T H.265 TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU (10/2014), SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS, Infrastructure of audiovisual services - Coding of moving video, High efficiency video coding, Recommendation ITU-T H.265". For example, the texture information of the part of the 3D scene that is visible according to the first central point of view 30 is encoded in a first syntax element (or in a first video track). The geometry information (for example, depth images or depth patch atlases) of the parts of the 3D scene that are visible from the set of points of view 33 is encoded in a second syntax element (or in an additional video track). The texture information of the parts of the 3D scene visible from the points of view of the set of points of view 33 that excludes the first point of view 30 (that is, the texture information that was not encoded in the first syntax element) is encoded in a third syntax element (or in an additional video track).
[0134] [0134] According to a variant, the geometry and texture information are encoded in the same syntax element, that is, the second and third syntax elements form the same bit stream syntax element.
[0135] [0135] In an operation 115, the signaling information and metadata associated with the 3D to 2D transformation (or transformations) used to represent the 3D scene in two dimensions are encoded / formatted in a container, for example, container 13 that will be described in more detail in relation to Figure 13.
[0136] [0136] The stream (or streams) of bits obtained in operations 114 and 115 is stored in a memory device and / or transmitted to be decoded and processed, for example, to render the data representative of the 3D scene comprised in such a stream (or bit streams), as described in more detail in relation to Figure 12. The bit stream may comprise, for example, the data encoded / formatted in the container and the data encoded in the first, second and third syntax elements generated during operation 114.
[0137] [0137] Figure 12 shows a process of obtaining, decapsulating, decoding and / or interpreting data representative of the 3D scene 20 from the one or more bit streams obtained from the process of Figure 11, according to a particular modality of the present principles.
[0138] [0138] In an operation 121, the container obtained in operation 115 (whose example is shown in Figure 13) is interpreted and the data contained in that container are decapsulated and / or decoded to then decode the data encoded in the first, second and third syntax elements and / or audio tracks in operations 122 and 123.
[0139] [0139] In an operation 124, a 3DoF representation of the 3D scene or a 3DoF + representation of the 3D scene is composed and optionally rendered using the decoded data from the container and the decoded data of the first syntax element (for the 3DoF representation) or the decoded data of the first, second and third syntax elements (for the 3DoF + representation).
[0140] [0140] In an optional additional operation, the rendered 3D scene can be displayed on a display device, such as an HMD, or stored on a memory device.
[0141] [0141] In an optional additional operation, the audio information is rendered from the audio tracks decoded for storage on a memory device or rendered using a speaker (speakers).
[0142] [0142] Figure 13 shows a non-restrictive example of the syntax of a container 13. Container 13 corresponds, for example, to an ISOBMFF file (ISO Base Media File Format, ISO / IEC 14496-12 - MPEG-4 Part 12) which comprises the following elements (a schematic sketch of this track layout is given after the list):
[0143] [0143] - a first video track 131 that comprises signaling information with metadata to generate the 3D points of the 3D scene from the texture data encoded in the first syntax element in operation 114. The first video track can comprise, for example, a sequence 1311 of frame samples, each comprising metadata that describe parts of the texture data encoded in the first syntax element. A timestamp can be associated with each data sample, a sample frame being, for example, associated with an image of the 3D scene at time t or with a group of pictures (GOP). The metadata and signaling information comprised in the first video track 131 make it possible to obtain a 3D representation of the scene in combination with the texture data encoded in the first syntax element, the 3D scene being reconstructed according to the single first point of view, for a 3DoF rendering of the scene;
[0144] [0144] - a second video track 132 comprising signaling information with metadata that allow the reconstruction of the geometry of the 3D scene from the geometry data encoded in the second syntax element in operation 114. The second video track 132 can comprise, for example, a sequence 1321 of frame samples, each of which comprises metadata describing parts of the geometry data encoded in the second syntax element. A timestamp can be associated with each data sample, a sample frame being, for example, associated with an image of the 3D scene at time t or with a group of pictures (GOP);
[0145] [0145] - a third video track 133 that comprises signaling information with metadata that allows the reconstruction of the texture of the 3D scene from the texture data encoded in the third syntax element in operation 114, for the views of the points of view different from the first point of view. The third video track 133 may comprise, for example, a sequence 1331 of sample frames, each of which comprises metadata describing part of the texture data encoded in the third syntax element.
[0146] [0146] - a fourth track 134 comprising timed metadata (for example, deprojection parameters) that can be used in association with the data comprised in the first video track 131, the second video track 132 and the third video track 133 .
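By way of illustration only, the four-track layout of container 13 can be summarized as follows; this is a descriptive sketch with hypothetical names, not an actual ISOBMFF writer:

container_13 = {
    "track_131": {"kind": "video", "role": "texture, first point of view (3DoF)"},
    "track_132": {"kind": "video", "role": "geometry, set of points of view (3DoF+)"},
    "track_133": {"kind": "video", "role": "texture, points of view other than the first"},
    "track_134": {"kind": "timed metadata", "role": "deprojection parameters for tracks 131-133"},
}

def tracks_needed(rendering_mode):
    """Return the tracks a renderer uses: only track 131 for 3DoF,
    all four tracks for 3DoF+."""
    return ["track_131"] if rendering_mode == "3dof" else list(container_13)

print(tracks_needed("3dof"))
print(tracks_needed("3dof+"))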
[0147] [0147] A 3DoF + rendering of the 3D scene (that is, with parallax) uses the four tracks 131 to 134, while a 3DoF-only rendering of the scene uses only the first track 131, which allows decoders and renderers that are not compatible with a 3DoF + (or 6DoF) rendering to interpret, decode and render the data representative of the 3D scene. Formatting the data according to the format described above in this document allows a decoding / rendering of the 3D scene according to 3DoF or 3DoF + from the same file / container, depending on the capabilities of the decoder / renderer. Such a file / container format allows backward compatibility of 3DoF + content with a 3DoF receiver.
[0148] [0148] The second and third video tracks 132, 133, which carry the necessary 3D geometry and texture data, enable the 3DoF + presentation: the 3DoF + geometry track carries the projected geometry maps, and the projected 3DoF + texture track carries the projected texture maps. A deprojection mechanism is specified to map the pixels of rectangular video frames onto 3D point cloud data. A specific so-called Multiple Shifted Equirectangular Projection (MS-ERP) can be defined as the 3D-to-2D projection; however, other alternative projection mechanisms can be implemented. The MS-ERP combines a set of equirectangular projections on spheres shifted from the central point of view (that is, the first point of view 30) and with different orientations. According to a variant, an additional second video track can be used to carry the mapping information between the patches of the patch atlases (geometry and texture) and the corresponding 2D parameterization and the associated 3D part of the 3D scene, especially when the geometry patches and texture patches are arranged in the geometry patch atlas and in the texture patch atlas, respectively.
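As a rough illustration of the idea behind MS-ERP (an equirectangular map attached to a sphere whose center is shifted from the central point of view); the formulas below are a simplified sketch, not the normative projection:

import math

def mserp_deproject(u, v, depth, width, height, center_offset):
    """Recover a 3D point from pixel (u, v) of one equirectangular map of an
    MS-ERP set: the map is an ERP around a sphere center shifted by
    'center_offset' from the central point of view."""
    longitude = (u / width - 0.5) * 2 * math.pi
    latitude = (0.5 - v / height) * math.pi
    direction = (math.cos(latitude) * math.sin(longitude),
                 math.sin(latitude),
                 math.cos(latitude) * math.cos(longitude))
    ox, oy, oz = center_offset
    return (ox + depth * direction[0],
            oy + depth * direction[1],
            oz + depth * direction[2])

# A pixel at the image center, 2 m away, seen from a sphere shifted 10 cm to the right.
print(mserp_deproject(2048, 1024, 2.0, 4096, 2048, (0.1, 0.0, 0.0)))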
[0149] [0149] According to a variant, the metadata included in the fourth track 134 are not encapsulated in container 13. According to this variant, the metadata in the fourth track 134 is transmitted in band, with the same structure.
[0150] [0150] The second video track 132 defined in the present disclosure contains the geometry information related to 3DoF + elements. A possible modality of such geometry information is to use a video organized in sub-regions for which each of these regions contains a depth map, mask information and point of view. For some content, parts of the geometry information (such as point of view information) remain static throughout the content, the present invention allows the signaling of such static information in static ISOBMFF boxes, however, it also allows the sending of such information in a timed metadata track, if they are dynamically changed at some point in the content.
[0151] [0151] Similar to what is done for the first video track 131, a restricted video scheme is defined for the second video track 132 (for example, here the type of scheme 'p3pg', for the projected 3DoF + geometry) that contains a single new box (for example here, Projected3DoFplusGeometryBox) that carries the following information:
[0152] [0152] - the projection format of the projected geometry map;
[0153] [0153] - static deprojection parameters for the projection format;
[0154] [0154] - a flag that indicates whether there are further deprojection parameters that are temporally dynamic.
[0155] [0155] The use of the projected 3DoF+ geometry video scheme for the restricted visual sample entry type 'resv' indicates that the decoded images are projected geometry map images. The use of the projected 3DoF+ geometry scheme is indicated by scheme_type equal to 'p3pg' within the SchemeTypeBox. The format of the projected geometry map images is indicated with the Projected3DoFplusGeometryBox contained within the SchemeInformationBox.
[0156] [0156] An illustrative example of ISOBMFF syntax for these elements is:

Projected 3DoF+ geometry box
Box Type: 'p3pg'
Container: Scheme Information box ('schi')
Mandatory: Yes, when scheme_type is equal to 'p3pg'
Quantity: Zero or one

aligned(8) class Projected3DoFplusGeometryBox extends Box('p3pg') {
    ProjectionFormat3DoFplusBox(); // mandatory
    Box[] other_boxes;             // optional
}

aligned(8) class ProjectionFormat3DoFplusBox() extends ProjectionFormatBox('pf3d') {
    bit(7) reserved = 0;
[0157] [0157] With the following semantics:
[0158] [0158] projection_type (defined in the OMAF ProjectionFormatBox (Study of ISO/IEC DIS 23000-20 Omnidirectional Media Format, ISO/IEC JTC1/SC29/WG11 N16950, July 2017, Torino, Italy), whose syntax is reused through the box extension) indicates the particular mapping of the samples of the rectangular decoded output image onto the 3D coordinate system; projection_type equal to 0 indicates the multiple shifted equirectangular projection (MS-ERP).
[0159] [0159] static_flag equal to 0 indicates that the projection parameters are dynamically updated over time. In this case, a timed metadata track for the current video track is mandatory to describe dynamic deprojection parameters. When projection_type is equal to 0, static_flag must be equal to 0.
[0160] [0160] ShiftedViewpointsGeometry specifies all the viewpoints used by the MS-ERP projection and their positions relative to the central viewpoint (that is, the origin of the global coordinate system).
[0161] [0161] num_viewpoints indicates the number of points of view, distinct from the central point of view, that are used by the MS-ERP projection; num_viewpoints values are in the range 0 to 7.
[0162] [0162] radius is a 16.16 fixed-point value that specifies the distance of a shifted viewpoint from the origin of the global coordinate system.
[0163] [0163] static_viewpoints_geometry_flag equal to 0 indicates that the number and geometry of the additional viewpoints used by the MS-ERP projection are dynamically updated over time. In this case, the ShiftedViewpointsGeometry instances in the timed metadata track for the current video track take precedence over the static instances defined in the scheme information box.
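A minimal parsing sketch for the fields described above is given below. Only the reserved bits shown in the syntax fragment are taken from the listing; the widths assumed for static_flag (1 bit), num_viewpoints (8 bits) and radius (32-bit 16.16 fixed point) follow the semantics but are assumptions rather than the normative box layout.

```python
import struct

# Hedged sketch: reading the fields described above from a byte payload.
# Field widths beyond the 7 reserved bits are assumptions, not the normative layout.

def parse_projection_format_3dofplus(payload: bytes) -> dict:
    flags = payload[0]
    static_flag = flags & 0x01                 # bit following the 7 reserved bits (assumed)
    num_viewpoints = payload[1]                # assumed 8-bit count, expected range 0..7
    radii = []
    offset = 2
    for _ in range(num_viewpoints):
        (fixed,) = struct.unpack_from(">I", payload, offset)
        radii.append(fixed / 65536.0)          # 16.16 fixed point -> metres
        offset += 4
    return {"static_flag": static_flag, "num_viewpoints": num_viewpoints, "radii_m": radii}

# Example payload: static_flag = 1, two shifted viewpoints at 0.5 m and 1.25 m.
example = bytes([0x01, 0x02]) + struct.pack(">I", int(0.5 * 65536)) + struct.pack(">I", int(1.25 * 65536))
print(parse_projection_format_3dofplus(example))
```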
[0164] [0164] The third video track 133 defined in the present disclosure contains the texture information related to the 3DoF+ elements.
[0165] [0165] Similar to what is done for the first video track 131, a restricted video scheme is defined here for the projected 3DoF+ texture video (for example, here, the scheme type 'p3pt') that contains a single new box (for example, here, Projected3DoFplusTextureBox) that carries the following information:
[0166] [0166] - the projection format of the projected texture map;
[0167] [0167] - static deprojection parameters for the projection format;
[0168] [0168] - a flag that indicates whether there are further deprojection parameters that are temporally dynamic.
[0169] [0169] The use of the projected 3DoF+ texture video scheme for the restricted visual sample entry type 'resv' indicates that the decoded images are projected images that contain texture content of parts of the scene that are not seen from the central point of view but are uncovered in a 3DoF+ experience. The use of the projected 3DoF+ texture scheme is indicated by scheme_type equal to 'p3pt' within the SchemeTypeBox. The format of the projected texture images is indicated with the Projected3DoFplusTextureBox contained within the SchemeInformationBox.
[0170] [0170] An ISOBMFF syntax proposed for this element is:

Projected 3DoF+ texture box
Box Type: 'p3pt'
Container: Scheme Information box ('schi')
Mandatory: Yes, when scheme_type is equal to 'p3pt'
Quantity: Zero or one

aligned(8) class Projected3DoFplusTextureBox extends Box('p3pt') {
    ProjectionFormat3DoFplusBox(); // mandatory
    Box[] other_boxes;             // optional
}
[0171] [0171] where ProjectionFormat3DoFplusBox is the same box as in the 3DoF+ geometry video track.
[0172] [0172] The first (3DoF) video track 131, the 3DoF+ geometry video track and the 3DoF+ texture video track must be associated since, except for the first video track 131, they are not independent tracks. The second and third video tracks can be contained in the same ISOBMFF track group. For example, a TrackGroupTypeBox with track_group_type equal to '3dfp' indicates that this is a group of tracks that can be processed to obtain images suitable for a 3DoF+ visual experience. The tracks mapped to that grouping (that is, tracks that have the same track_group_id value within a TrackGroupTypeBox with track_group_type equal to '3dfp') collectively represent, when combined with a projected omnidirectional (3DoF) video track, 3DoF+ visual content that can be displayed.
[0173] [0173] One or more of the following restrictions may apply to the tracks mapped to this grouping (an illustrative consistency check is sketched after the list):
[0174] [0174] - This grouping must consist of at least two video tracks with sample entry type equal to 'resv': at least one with a scheme_type that identifies a 3DoF+ geometry video track (for example, 'p3pg' here) and one with a scheme_type that identifies a 3DoF+ texture video track (for example, 'p3pt' here);
[0175] [0175] - The contents of the ProjectionFormat3DoFplusBox instances included in the sample entries of the 3DoF+ geometry map ('p3pg') and texture map ('p3pt') video tracks must be identical;
[0176] [0176] - When static_flag within the ProjectionFormat3DoFplusBox is equal to 0, a timed metadata track ('dupp') that describes the dynamic deprojection parameters must be present in the 'moov' container box 13 and linked to the 3DoF+ track group 135 (with a 'cdtg' track reference).
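The sketch below illustrates such a consistency check on a simplified in-memory view of the tracks; the dictionary fields are illustrative and do not correspond to an actual ISOBMFF library API.

```python
# Illustrative consistency check for the restrictions listed above, operating on a
# simplified in-memory description of the tracks (field names are illustrative).

def check_3dofplus_group(tracks, timed_metadata_tracks):
    """tracks: list of dicts with 'track_group_type', 'scheme_type', 'projection_format'."""
    group = [t for t in tracks if t.get("track_group_type") == "3dfp"]
    has_geometry = any(t["scheme_type"] == "p3pg" for t in group)
    has_texture = any(t["scheme_type"] == "p3pt" for t in group)
    if not (has_geometry and has_texture):
        return False, "group must contain at least one 'p3pg' and one 'p3pt' track"
    if len({repr(t["projection_format"]) for t in group}) > 1:
        return False, "ProjectionFormat3DoFplusBox instances must be identical"
    if any(t["projection_format"]["static_flag"] == 0 for t in group):
        if not any(m.get("sample_entry") == "dupp" for m in timed_metadata_tracks):
            return False, "dynamic parameters require a 'dupp' timed metadata track"
    return True, "ok"


fmt = {"projection_type": 0, "static_flag": 0}
tracks = [
    {"track_group_type": "3dfp", "scheme_type": "p3pg", "projection_format": fmt},
    {"track_group_type": "3dfp", "scheme_type": "p3pt", "projection_format": fmt},
]
print(check_3dofplus_group(tracks, [{"sample_entry": "dupp"}]))  # (True, 'ok')
```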
[0177] [0177] Although some of the deprojection parameters are static and can be described in the 3DoF+ geometry and texture tracks (that is, the second and third video tracks 132, 133), part of the deprojection parameters for 3DoF+ content is dynamic. Such dynamic deprojection parameters can be transmitted in a timed metadata track, that is, the fourth track 134, associated with the first, second and third video tracks 131, 132, 133.
[0178] [0178] According to a non-restrictive embodiment, a metadata sample entry of type 'dupp' (for dynamic deprojection parameters) can be defined as described below:

Sample Entry Type: 'dupp'
Container: Sample Description Box ('stsd')
Mandatory: No
Quantity: Zero or one

class UnprojectionParametersSampleEntry() extends MetadataSampleEntry('dupp') {
}
[0179] [0179] Each metadata sample contains all the information required to perform the deprojection of all the parts (3D patches) of the volumetric video from the omnidirectional (3DoF) video, the projected 3DoF+ geometry video and the projected 3DoF+ texture video, that is, the first, second and third video tracks 131, 132 and 133.
[0180] [0180] The projections of the 3D patch data onto their associated projection surfaces produce a collection of irregularly shaped 2D regions, whose rectangular bounding boxes are further mapped onto a packed image by indicating their locations, orientations and sizes. The texture and geometry data are packed into separate images. The sequence of packed texture images and the sequence of packed geometry images form the projected 3DoF+ texture atlas map and the projected 3DoF+ geometry atlas map, respectively.
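For illustration, a naive shelf packer such as the one sketched below already produces the kind of per-region position and size information that the packing structure described next has to carry; it is not the packing algorithm of the present disclosure.

```python
# Naive shelf-packing sketch: places the bounding boxes of projected patches into one
# atlas image and records position and size per region. This is only an illustration
# of the information carried by the packing structure, not the disclosed algorithm.

def pack_patches(sizes, atlas_width):
    """sizes: list of (w, h) bounding boxes; returns list of (x, y, w, h) regions."""
    regions, x, y, shelf_h = [], 0, 0, 0
    for w, h in sizes:
        if x + w > atlas_width:            # start a new shelf below the current one
            x, y, shelf_h = 0, y + shelf_h, 0
        regions.append((x, y, w, h))
        x += w
        shelf_h = max(shelf_h, h)
    return regions

print(pack_patches([(300, 200), (500, 250), (400, 100), (600, 300)], atlas_width=1024))
```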
[0181] [0181] A packing structure inspired by the region-wise packing structure defined in OMAF (Study of ISO/IEC DIS 23000-20 Omnidirectional Media Format, ISO/IEC JTC1/SC29/WG11 N16950, July 2017, Torino, Italy), but keeping only the useful parameters (number of regions and, for each region: guard band information, optional transformation, position and size), can be generated. Compared to the region-wise packing structure, the number of regions also needs to be extended, as atlases are expected to use more than 256 regions.
[0182] [0182] Each sample specifies a list of 3D patches. Each 3D patch describes a portion of the 3D scene volume (a spherical strip) and is attached to the storage structure of the texture and geometry data projected for that patch. This includes:
[0183] [0183] - Information on all the points of view from which the 3D patches are viewed. If this information is static (and therefore signaled in the 3DoF+ geometry and texture tracks), then a flag must also be present in the timed metadata to indicate this.
[0184] [0184] - Information on the organization/packing of all the 2D rectangular patches in the geometry video. This is referred to in this invention as the 3DoF+ geometry atlas map.
[0185] [0185] - Information on the organization/packing of all the 2D rectangular patches in the texture video. This is referred to in this invention as the 3DoF+ texture atlas map.
[0186] [0186] - Information on the number of 3D patches and, for each 3D patch, information on:
[0187] [0187] o the 3D volume described by the 3D patch, identified by minimum and maximum values of the yaw angle, the pitch angle and the radius,
[0188] [0188] o the point of view from which the 3D patch is viewed (and possibly with a different orientation),
[0189] [0189] o the identification of the patch in the 3DoF+ geometry map,
[0190] [0190] o the identification of the patch in the 3DoF+ texture map (that is, the third video track 133) or in the first video track 131.
[0191] [0191] One possible ISOBMFF embodiment for the sample metadata format is as follows:

aligned(8) UnprojectionParametersSample() {
    bit(7) reserved = 0;
    unsigned int(1) static_viewpoints_geometry_flag;
    if (static_viewpoints_geometry_flag == 0)
        ShiftedViewpointsGeometry(); // number and locations of shifted viewpoints
    unsigned int(16) num_3Dpatches;
    for (i = 0; i < num_3Dpatches; i++)
        PatchStruct();
    PatchAtlasPackingStruct();       // texture atlas map
    PatchAtlasPackingStruct();       // geometry atlas map
}

aligned(8) class PatchStruct() extends SphericalRange() {
    bit(3) reserved = 0;
    unsigned int(3) sphere_id;
    unsigned int(2) orientation_id;
    bit(7) reserved = 0;
    unsigned int(1) omnidirectional_compatible_flag;
    if (omnidirectional_compatible_flag == 0)
        unsigned int(16) texture_atlas_region_id;
    bit(6) reserved = 0;
    unsigned int(16) geometry_atlas_region_id;
}
[0192] [0192] where:
[0193] [0193] static_viewpoints_geometry_flag indicates that the number and locations of displaced points of view used by the MS-ERP projection are static and should be found in ProjectionFormat3DoFplusBox.
[0194] [0194] num_3Dpatches specifies the number of 3D patches.
[0195] [0195] SphericalRange specifies (in spherical coordinates) the 3D volume described by the patch:
[0196] [0196] - yaw_min and yaw_max specify the minimum and maximum yaw angles, in units of 180 * 2^-16 degrees, relative to the projection sphere coordinate axes; they must be in the range of -2^16 to 2^16 - 1, inclusive (that is, ±180°);
[0197] [0197] - pitch_min and pitch_max specify the minimum and maximum pitch angles, in units of 180 * 2^-16 degrees, relative to the projection sphere coordinate axes; they must be in the range of -2^15 to 2^15, inclusive (that is, ±90°);
[0198] [0198] - rho_min and rho_max are 16.16 fixed-point values that specify the minimum and maximum radii relative to the projection sphere coordinate axes (in meters); a short sketch of these unit conversions is given after this list.
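A short sketch of these unit conversions, assuming two's-complement storage of the angle fields:

```python
# Converting the stored integer fields above to physical units: angles are in units
# of 180 * 2^-16 degrees, radii are 16.16 fixed-point metres. A quick check of the
# boundary values (sketch; two's-complement storage of the angles is assumed).

def angle_to_degrees(raw):
    return raw * 180.0 / 65536.0          # 180 * 2^-16 degrees per unit

def fixed_16_16_to_meters(raw):
    return raw / 65536.0

print(angle_to_degrees(-2**16), angle_to_degrees(2**16 - 1))   # ~ -180.0 .. +180.0 (yaw)
print(angle_to_degrees(-2**15), angle_to_degrees(2**15))       #   -90.0 .. +90.0  (pitch)
print(fixed_16_16_to_meters(0x00018000))                       # 1.5 metres
```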
[0199] [0199] omnidirectional_compatible_flag indicates that the patch texture content is found in the first video track;
[0200] [0200] sphere_id values are in the range 0 to 7:
[0201] [0201] - sphere_id equal to 0 indicates that the projection sphere used for the first video track (centered on the origin of the scene coordinate system) is used; if omnidirectional_compatible_flag is equal to 1, sphere_id must be equal to 0; if omnidirectional_compatible_flag is equal to 0, then sphere_id must not be equal to 0;
[0202] [0202] - sphere_id values in the range 1 to num_viewpoints indicate which of the num_viewpoints additional MS-ERP projection spheres is used; the patch texture content is then found in the projected 3DoF+ texture video track;
[0203] [0203] orientation_id specifies the orientation of the coordinate axes of the current MS-ERP projection sphere:
[0204] [0204] - orientation_id values in the range 1 to 3 correspond to 3 different orientations;
[0205] [0205] - orientation_id must be equal to 0 when sphere_id is equal to 0.

PatchAtlasPackingStruct specifies such a layout of rectangular regions. The first instance of PatchAtlasPackingStruct in UnprojectionParametersSample specifies the texture patch packing arrangement, the second instance describes the geometry patch packing arrangement.
[0206] [0206] texture_atlas_region_id specifies the index of the rectangular region in the packed texture image (texture patch atlas).
[0207] [0207] geometry_atlas_region_id specifies the index of the rectangular region in the packed geometry image (geometry patch atlas).
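Following these semantics, a renderer can resolve where the data of one 3D patch lives: the texture comes either from the first (omnidirectional) video track or from a region of the texture patch atlas, and the geometry comes from a region of the geometry patch atlas. The sketch below illustrates this lookup with illustrative data structures.

```python
# Sketch of resolving the data sources of one 3D patch according to the semantics
# above. The dictionaries are illustrative stand-ins for parsed PatchStruct and
# PatchAtlasPackingStruct instances, not an actual parsing API.

def locate_patch_data(patch, texture_atlas_regions, geometry_atlas_regions):
    if patch["omnidirectional_compatible_flag"] == 1:
        texture_source = ("first_video_track", None)           # track 131, whole ERP frame
    else:
        texture_source = ("texture_atlas",
                          texture_atlas_regions[patch["texture_atlas_region_id"]])
    geometry_source = ("geometry_atlas",
                       geometry_atlas_regions[patch["geometry_atlas_region_id"]])
    return texture_source, geometry_source


tex_regions = {0: (0, 0, 300, 200), 1: (300, 0, 500, 250)}   # (x, y, w, h) per region index
geo_regions = {0: (0, 0, 300, 200), 1: (300, 0, 500, 250)}
patch = {"omnidirectional_compatible_flag": 0,
         "texture_atlas_region_id": 1, "geometry_atlas_region_id": 1}
print(locate_patch_data(patch, tex_regions, geo_regions))
```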
[0208] [0208] Figure 14 shows an example of an embodiment of the syntax of a stream when the data is transmitted using a packet-based transmission protocol. Figure 14 shows an exemplary structure 14 of a volumetric video stream. The structure consists of a container that organizes the stream into independent syntax elements. The structure may comprise a header part 141, which is a set of data common to all the syntax elements of the stream. For example, the header part comprises metadata about the syntax elements, describing the nature and function of each of them. The header part may also comprise the coordinates of the point of view used for encoding the first color image for 3DoF rendering and information about the size and resolution of the images. The structure comprises a payload comprising a first syntax element 142 and at least one second syntax element 143. The first syntax element 142 comprises data representative of the first color image prepared for a 3DoF rendering, which corresponds to the first video track associated with the texture data encoded in the first syntax element obtained in operation 114.
[0209] [0209] The one or more second syntax elements 143 comprise geometry information and texture information associated with the second and third video tracks and the respective second and third syntax elements of the encoded data obtained in operation 114.
[0210] [0210] For the purpose of illustration, in the context of the ISOBMFF file format standard, the texture map, the geometry map and the metadata would typically be referenced in ISOBMFF tracks in a box of type 'moov', with the texture data and geometry data themselves embedded in a media data box of type 'mdat'.
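For illustration, the sketch below walks the top-level boxes of an ISOBMFF file to locate the 'moov' and 'mdat' boxes; it relies only on the standard 8-byte box header (32-bit size plus 4-character type, with an optional 64-bit largesize) and leaves everything inside 'moov' out of scope.

```python
import struct

# Minimal top-level ISOBMFF box walk: each box starts with a 32-bit size and a
# 4-character type ('moov' carries the track descriptions, 'mdat' the texture and
# geometry samples). This is a sketch, not a complete demultiplexer.

def iter_top_level_boxes(data: bytes):
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size == 1:                                   # 64-bit largesize follows the header
            (size,) = struct.unpack_from(">Q", data, offset + 8)
        yield box_type.decode("ascii"), offset, size
        offset += size if size else len(data) - offset  # size 0: box extends to end of file

# Tiny synthetic example: an empty 'moov' box followed by an 'mdat' box with 4 payload bytes.
sample = struct.pack(">I4s", 8, b"moov") + struct.pack(">I4s", 12, b"mdat") + b"\x00" * 4
print(list(iter_top_level_boxes(sample)))  # [('moov', 0, 8), ('mdat', 8, 12)]
```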
[0211] [0211] Figure 15 shows an exemplary architecture of a device 15 that can be configured to implement a method described in relation to Figures 11, 12, 16 and/or 17. The device 15 can be configured to be an encoder 101 or a decoder 103 of Figure 10.
[0212] [0212] The device 15 comprises the following elements that are linked together by a data and address bus 151:
[0213] [0213] - a microprocessor 152 (or CPU), which is, for example, a DSP (or Digital Signal Processor);
[0214] [0214] - a ROM (or Read-Only Memory) 153;
[0215] [0215] - a RAM (or Random Access Memory) 154;
[0216] [0216] - a storage interface 155;
[0217] [0217] - an I/O interface 156 for receiving the data to be transmitted from an application; and
[0218] [0218] - a power source, for example, a battery.
[0219] [0219] According to an example, the power supply is external to the device. In each of the aforementioned memories, the word "register" used in the specification can correspond to an area of small capacity (a few bits) or to a very large area (for example, a whole program or a large amount of received or decoded data). The ROM 153 comprises at least a program and parameters. The ROM 153 can store algorithms and instructions for performing techniques in accordance with the present principles. When switched on, the CPU 152 loads the program into the RAM and executes the corresponding instructions.
[0220] [0220] The RAM 154 comprises, in a register, the program executed by the CPU 152 and loaded after the device 15 is switched on, input data in a register, intermediate data in different states of the method in a register, and other variables used for the execution of the method in a register.
[0221] [0221] The implementations described in this document can be implemented, for example, in a method or a process, an apparatus, a computer program product, a data stream or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method or a device), the implementation of discussed features can also be implemented in other ways (for example, a program). A device can be implemented, for example, in suitable hardware, software and firmware. The methods can be implemented, for example, in an apparatus, such as, for example, a processor, which refers to processing devices, in general, which include, for example, a computer, a microprocessor, an integrated circuit or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable / personal digital assistants ("PDAs"), and other devices that facilitate the communication of information between end users.
[0222] [0222] According to an encoding example or an encoder 101 of Figure 10, the three-dimensional scene 20 is obtained from a source. For example, the source belongs to a set that comprises:
[0223] [0223] - a local memory (153 or 154), for example, a video memory or a RAM (or Random Access Memory), a flash memory, a ROM (or Read-Only Memory), a hard disk;
[0224] [0224] - a storage interface (155), for example, an interface with mass storage, a RAM, a flash memory, a ROM, an optical disk or a magnetic disk,
[0225] [0225] - a communication interface (156), for example, a wired interface (for example, a bus interface, a wide area network interface, a LAN interface) or a wireless interface (such as an IEEE 802.11 interface or a Bluetooth® interface); and
[0226] [0226] - a user interface, such as a Graphical User Interface that allows a user to enter data.
[0227] [0227] According to examples of decoding or decoder (or decoders) 103 in Figure 10, the stream is sent to a destination; specifically, the destination belongs to a set comprising:
[0228] [0228] - a local memory (153 or 154), for example, a video memory or a RAM, a flash memory, a hard disk;
[0229] [0229] - a storage interface (155), for example, an interface with mass storage, a RAM, a flash memory, a ROM, an optical disk or a magnetic disk, and
[0230] [0230] - a communication interface (156), for example, a wired interface (for example, a bus interface (for example, USB (or Universal Serial Bus)), a wide area network interface, a LAN interface, an HDMI (High Definition Multimedia Interface) interface) or a wireless interface (such as an IEEE 802.11 interface, a WiFi® interface or a Bluetooth® interface).
[0231] [0231] According to examples of the encoding or of an encoder, a bit stream comprising data representative of the volumetric scene is sent to a destination. As an example, the bit stream is stored in a local or remote memory, for example, a video memory or a RAM, a hard disk. In a variant, the bit stream is sent to a storage interface, for example, an interface with a mass storage, a flash memory, a ROM, an optical disk or a magnetic medium, and/or transmitted through a communication interface, for example, an interface to a point-to-point link, a communication bus, a point-to-multipoint link or a broadcast network.
[0232] [0232] According to examples of the decoding or of a decoder or renderer 103 of Figure 10, the bit stream is obtained from a source. For example, the bit stream is read from a local memory, for example, a video memory, a RAM, a ROM, a flash memory or a hard disk. In a variant, the bit stream is received from a storage interface, for example, an interface with a mass storage, a RAM, a ROM, a flash memory, an optical disk or a magnetic medium, and/or received through a communication interface, for example, an interface to a point-to-point link, a bus, a point-to-multipoint link or a broadcast network.
[0233] [0233] According to the examples, device 15 is configured to implement a method described in relation to Figures 11, 12, 16 and / or 17, and belongs to a set comprising:
[0234] [0234] - a mobile device;
[0235] [0235] - a communication device;
[0236] [0236] - a gaming device;
[0237] [0237] - a tablet (or tablet type computer);
[0238] [0238] - a laptop;
[0239] [0239] - a still image camera;
[0240] [0240] - a video camera;
[0241] [0241] - an encoding chip;
[0242] [0242] - a server (for example, a broadcast server, a video on demand server or a web server).
[0243] [0243] Figure 16 illustrates a method for encoding data representative of a 3D scene, for example, the 3D scene 20, according to a non-restrictive embodiment of the present principles. The method can be, for example, implemented in the encoder 101 and/or in the device 15. The different parameters of the device 15 can be updated. The 3D scene can, for example, be obtained from a source, one or more points of view can be determined in the space of the 3D scene, and parameters associated with the projection mapping (or mappings) can be initialized.
[0244] [0244] In a first operation 161, first data representative of the texture of the 3D scene are encoded or formatted in a first video track of a container or file. The first data refer to the parts (for example, points or mesh elements) of the 3D scene that are visible from a single first point of view. The first data comprise, for example, metadata and signaling information that point to a first bit stream syntax element that comprises the texture information encoded in pixels of patches or images of the 3D scene, obtained, for example, by a 3D-to-2D transformation (for example, an equirectangular projection of the 3D scene onto patches or images, each patch or image being associated with a part of the 3D scene). The metadata encoded in the first video track comprise, for example, the parameters of the 3D-to-2D transformation or the parameters of the reverse transformation (2D to 3D). The first data, once decoded or interpreted, make it possible to obtain a 3DoF representation of the 3D scene according to the first point of view, that is, a representation without parallax.
[0245] [0245] In a second operation 162, second data representative of the geometry of the 3D scene are encoded or formatted in a second video track of the container or file. The second data refer to the parts (for example, points or mesh elements) of the 3D scene that are visible according to a set (or range) of points of view that includes the first point of view. The second data comprise, for example, metadata and signaling information that point to a second bit stream syntax element that comprises the geometry information encoded in pixels of patches or images of the 3D scene, obtained, for example, by a 3D-to-2D transformation (for example, an equirectangular projection of the 3D scene onto patches or images, each patch or image being associated with a part of the 3D scene). The metadata encoded in the second video track comprise, for example, the parameters of the 3D-to-2D transformation or the parameters of the reverse transformation (2D to 3D).
[0246] [0246] In a third operation 163, third data representative of the texture of at least part of the 3D scene are encoded or formatted in a third video track of the container or file. The third data refer to the parts (for example, points or mesh elements) of the 3D scene that are visible from the points of view of the set, excluding the part of the scene that is visible from the first point of view. The third data comprise, for example, metadata and signaling information that point to a third bit stream syntax element that comprises the texture information encoded in the pixels of patches or images of said parts of the 3D scene visible from the points of view of the set that excludes the first point of view, the patches (or images) being, for example, obtained by a 3D-to-2D transformation (for example, an equirectangular projection of the 3D scene onto patches or images, each patch or image being associated with a part of the 3D scene). The metadata encoded in the third video track comprise, for example, the parameters of the 3D-to-2D transformation or the parameters of the reverse transformation (2D to 3D).
[0247] [0247] In a fourth operation 164, metadata is encoded in a fourth track. The metadata is associated with the second data and the third data and allows a 3DoF+ representation of the 3D scene together with the first, second and third video tracks (and the associated data encoded in the first, second and third bit stream syntax elements). The metadata comprises information representative of one or more projections used to obtain the second and third data, for example, from one point of view to the other.
[0248] [0248] Metadata comprises at least one (or any combination) of the following information:
[0249] [0249] - information representative of at least one point of view associated with at least one projection used to obtain the geometry and texture;
[0250] [0250] - information representative of a package of 2D geometry patches, each geometry patch being associated with the projection of a part of the 3D scene;
[0251] [0251] - information representative of a 2D texture patch package, each texture patch being associated with the projection of a part of the 3D scene;
[0252] [0252] - information representing various patches, each patch being associated with a part of the 3D scene and associated with an identifier in the second and first video tracks or in the third video track.
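A rough, non-normative illustration of operations 161 to 164 is sketched below: the four tracks are gathered into a single container object before being written to a file or stream. The Container and Track classes are placeholders, not an actual multiplexing API.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional

# Placeholder structures: a schematic of assembling the four tracks described in
# operations 161 to 164 into one container. Not an actual ISOBMFF muxing library.

@dataclass
class Track:
    track_id: int
    kind: str                          # 'video' or 'timed_metadata'
    scheme_type: Optional[str]         # restricted scheme: None, 'p3pg' or 'p3pt'
    samples: List[Any] = field(default_factory=list)

@dataclass
class Container:
    tracks: List[Track] = field(default_factory=list)

def encode_3dofplus_container(first_data, second_data, third_data, metadata) -> Container:
    c = Container()
    c.tracks.append(Track(131, "video", None, first_data))         # operation 161: 3DoF texture
    c.tracks.append(Track(132, "video", "p3pg", second_data))      # operation 162: geometry
    c.tracks.append(Track(133, "video", "p3pt", third_data))       # operation 163: additional texture
    c.tracks.append(Track(134, "timed_metadata", None, metadata))  # operation 164: deprojection metadata
    return c

container = encode_3dofplus_container(["tex_0"], ["geo_0"], ["patch_tex_0"], ["meta_0"])
print([(t.track_id, t.kind, t.scheme_type) for t in container.tracks])
```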
[0253] [0253] According to a variant, the first, second and third syntax elements referred to by the first, second and third video tracks, respectively, are encapsulated in the same container as the first, second and third video tracks. According to another variant, the data of the first, second and third syntax elements are encapsulated in a file other than the file (or container) that comprises the data or metadata of the first, second, third and fourth tracks, all the tracks being transmitted in a single bit stream.
[0254] [0254] The second data comprise, for example, a first information representative of a projection format used to obtain the geometry, the projection parameters and a flag that indicates whether at least some of the projection parameters are dynamically updated. When the flag indicates that the parameters are dynamically updated, a parser can retrieve the updated parameters from the fourth track.
[0255] [0255] The third data comprise, for example, a second information representative of a projection format used to obtain the geometry, the projection parameters and a flag that indicates whether at least some of the projection parameters are dynamically updated. When the flag indicates that the parameters are dynamically updated, a parser can retrieve the updated parameters from the fourth track.
[0256] [0256] According to a variant, the first video track and the at least one second video track are grouped in the same track group when the first information and the second information are identical.
[0257] [0257] Figure 17 illustrates a method for decoding data representative of a 3D scene, for example, the 3D scene 20, according to a non-restrictive embodiment of the present principles. The method can be, for example, implemented in the decoder 103 and/or in the device 15.
[0258] [0258] In a first operation 171, first data representative of the texture of the part of the 3D scene that is visible according to a first point of view are decoded or interpreted from a first video track of a received container, the container being, for example, included in a bit stream.
[0259] [0259] In a second operation 172, second data representative of the geometry of the 3D scene that is visible according to the set of points of view comprising the first point of view are decoded or interpreted from a second video track of the received container.
[0260] [0260] In a third operation 173, third data representative of the texture of the part (or parts) of the 3D scene that is visible from the points of view of the set excluding the first point of view are decoded or interpreted from a third video track of the container.
[0261] [0261] In a fourth operation 174, metadata is decoded or interpreted from a fourth track of the container. The metadata is associated with the second data and the third data and allows a 3DoF+ representation of the 3D scene together with the first, second and third video tracks (and the associated data encoded in the first, second and third bit stream syntax elements). The metadata comprises information representative of one or more projections used to obtain the second and third data.
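Mirroring operations 171 to 174, the sketch below reads the four tracks back from a received container and decides, from the capabilities of the decoder/renderer, whether to build a 3DoF or a 3DoF+ representation; the Track tuple is a stand-in for parsed track data, not an actual de-multiplexing API.

```python
from collections import namedtuple

# Schematic of operations 171 to 174: the decoder gathers the decoded data of the
# four tracks and either stops at the 3DoF texture or builds the full 3DoF+ set.

Track = namedtuple("Track", "track_id samples")

def decode_scene(tracks, supports_3dof_plus):
    by_id = {t.track_id: t.samples for t in tracks}
    result = {"mode": "3DoF", "texture": by_id[131]}          # operation 171: 3DoF texture
    if supports_3dof_plus:
        result.update(mode="3DoF+",
                      geometry=by_id[132],                    # operation 172: geometry
                      extra_texture=by_id[133],               # operation 173: additional texture
                      metadata=by_id[134])                    # operation 174: deprojection parameters
    return result

tracks = [Track(131, ["tex"]), Track(132, ["geo"]), Track(133, ["patch_tex"]), Track(134, ["meta"])]
print(decode_scene(tracks, True)["mode"])    # 3DoF+
print(decode_scene(tracks, False)["mode"])   # 3DoF
```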
[0262] [0262] Naturally, the present disclosure is not limited to the previously described modalities.
[0263] [0263] In particular, the present disclosure is not limited to a method and device for encoding / decoding data representative of a 3D scene, but also extends to a method for generating a bit stream comprising the encoded data and any device that implements this method and, notably, any devices that comprise at least one CPU and / or at least one GPU.
[0264] [0264] The present disclosure also refers to a method (and a configured device) for displaying rendered images from the decoded data from the bit stream.
[0265] [0265] The present disclosure also relates to a method (and a configured device) for transmitting and / or receiving the bit stream.
[0266] [0266] The implementations described in this document can be implemented, for example, in a method or a process, an apparatus, a software program, a data flow or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method or a device), the implementation of discussed features can also be implemented in other ways (for example, a program). A device can be implemented, for example, in suitable hardware, software and firmware.
[0267] [0267] Implementations of the various processes and features described in this document can be incorporated into a variety of different equipment or applications, particularly, for example, equipment or applications associated with data encoding, data decoding, view generation, texture data processing, and other processing of images and of the related texture information and/or depth information. Examples of such equipment include an encoder, a decoder, a post-processor that processes output from a decoder, a pre-processor that provides input to an encoder, a video encoder, a video decoder, a video codec, a web server, a set-top box, a laptop computer, a personal computer, a cell phone, a PDA and other communication devices. As should be clear, the equipment can be mobile and even installed in a mobile vehicle.
[0268] [0268] Additionally, the methods can be implemented by instructions executed by a processor, and such instructions (and/or data values produced by an implementation) can be stored in a processor-readable medium, such as, for example, an integrated circuit, a software carrier or another storage device, such as, for example, a hard disk, a compact disc ("CD"), an optical disc (such as, for example, a DVD, often called a digital versatile disc or a digital video disc), a random access memory ("RAM") or a read-only memory ("ROM"). The instructions can form an application program tangibly embodied in a processor-readable medium. The instructions can be, for example, in hardware, firmware, software or a combination thereof. The instructions can be found, for example, in an operating system, in a separate application, or in a combination of the two. A processor can therefore be characterized as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process.
[0269] [0269] As will be evident to a person skilled in the art, implementations can produce a variety of signals formatted to carry information that can, for example, be stored or transmitted. The information can include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal can be formatted to carry as data the rules for writing or reading the syntax of a described embodiment, or to carry as data the actual syntax values written by a described embodiment. Such a signal can be formatted, for example, as an electromagnetic wave (for example, using a radio-frequency portion of the spectrum) or as a baseband signal. The formatting can include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries can be, for example, analog or digital information. The signal can be transmitted over a variety of different wired or wireless links, as is known. The signal can be stored on a processor-readable medium.
[0270] [0270] Several implementations have been described. However, it must be understood that several modifications can be made. For example, elements from different implementations can be combined, supplemented, modified or removed to produce other implementations. In addition, a person of ordinary skill in the art will understand that other structures and processes can be replaced by those disclosed and the resulting implementations will perform at least substantially the same function (or functions), at least substantially in the same way (or modes), to achieve at least substantially the same result (or results) as the revealed implementations.
Consequently, these and other implementations are covered by this application.
Claims (17)
[1]
1. Method CHARACTERIZED by the fact that it comprises:
- encoding, in a first video track of a container, first data representative of the texture of a 3D scene visible from a first point of view;
- encoding, in at least a second video track of said container, second data representative of the geometry of said 3D scene visible from a set of points of view and from said first point of view;
- encoding, in a third video track of said container, third data representative of the texture of said 3D scene visible only from the points of view of said set; and
- encoding metadata in said container, said metadata comprising information representative of at least one projection used to obtain said second and third data.
[2]
2. Method, according to claim 1, CHARACTERIZED by the fact that said second data comprise a first information representative of a projection format used to obtain the geometry, parameters of said projection and a flag that indicates whether at least one of the projection parameters is dynamically updated.
[3]
3. Method, according to claim 1 or 2, CHARACTERIZED by the fact that said third data comprise a second information representative of a projection format used to obtain the geometry, parameters of said projection and a flag that indicates whether at least one of the projection parameters is dynamically updated.
[4]
4. Method according to one of claims 1 to 3, CHARACTERIZED by the fact that said metadata comprises at least one of the following information:
- information representative of at least one point of view associated with at least one projection used to obtain said geometry and texture;
- information representative of a rectangular 2D geometry patch package, each geometry patch being associated with the projection of a part of said 3D scene;
- information representative of a rectangular 2D texture patch package, each texture patch being associated with the projection of a part of said 3D scene;
- information representative of various 3D patches, each 3D patch being associated with a part of the 3D scene and associated with an identifier on said second and said first video track or on said third video track.
[5]
5. Device CHARACTERIZED by the fact that it comprises a processor configured to:
- encode, in a first video track of a container, first data representative of the texture of a 3D scene visible from a first point of view;
- encode, in at least a second video track of said container, second data representative of the geometry of said 3D scene visible according to a set of points of view comprising said first point of view;
- encode, in a third video track of said container, third data representative of the texture of said 3D scene visible only from the points of view of said set; and
- encode metadata in said container, said metadata comprising information representative of at least one projection used to obtain said second and third data.
[6]
6. Device, according to claim 5, CHARACTERIZED by the fact that said second data comprise a first information representative of a projection format used to obtain the geometry,
said projection parameters and a flag indicating whether at least one of the projection parameters is dynamically updated.
[7]
7. Device, according to claim 5 or 6, CHARACTERIZED by the fact that said third data comprise a second information representative of a projection format used to obtain the geometry, parameters of said projection and a flag that indicates whether at least one of the projection parameters is dynamically updated.
[8]
8. Device according to one of claims 5 to 7, CHARACTERIZED by the fact that said metadata comprises at least one of the following information:
- information representative of at least one point of view associated with at least one projection used to obtain said geometry and texture;
- information representative of a rectangular 2D geometry patch package, each geometry patch being associated with the projection of a part of said 3D scene;
- information representative of a rectangular 2D texture patch package, each texture patch being associated with the projection of a part of said 3D scene;
- information representative of various 3D patches, each 3D patch being associated with a part of the 3D scene and associated with an identifier on said second and said first video track or on said third video track.
[9]
9. Method CHARACTERIZED by the fact that it comprises:
- decoding, from a first video track of a container, first data representative of the texture of a 3D scene visible from a first point of view;
- decoding, from at least a second video track of said container, second data representative of the geometry of said 3D scene visible from a set of points of view and from said first point of view;
- decoding, from a third video track of said container, third data representative of the texture of said 3D scene visible only from the points of view of said set; and
- decoding metadata from said container, said metadata comprising information representative of at least one projection used to obtain said second and third data.
[10]
10. Method, according to claim 9, CHARACTERIZED by the fact that said second data comprise a first information representative of a projection format used to obtain the geometry, parameters of said projection and a flag that indicates whether at least one of the projection parameters is dynamically updated.
[11]
11. Method, according to claim 9 or 10, CHARACTERIZED by the fact that said third data comprise a second information representative of a projection format used to obtain the geometry, parameters of said projection and a flag that indicates whether at least one of the projection parameters is dynamically updated.
[12]
12. Method according to one of claims 9 to 11, CHARACTERIZED by the fact that said metadata comprises at least one of the following information:
- information representative of at least one point of view associated with at least one projection used to obtain said geometry and texture;
- information representative of a package of 2D geometry patches, each geometry patch being associated with the projection of a part of said 3D scene;
- information representative of a 2D texture patch package, each texture patch being associated with the projection of a part of said 3D scene;
- information representative of several 3D patches, each patch being associated with a part of the 3D scene and associated with an identifier on said second and said first video track or on said third video track.
[13]
13. Device CHARACTERIZED by the fact that it comprises a processor configured to:
- decode, from a first video track of a container, first data representative of the texture of a 3D scene visible from a first point of view;
- decode, from at least a second video track of said container, second data representative of the geometry of said 3D scene visible from a set of points of view and from said first point of view;
- decode, from a third video track of said container, third data representative of the texture of said 3D scene visible only from the points of view of said set; and
- decode metadata from said container, said metadata comprising information representative of at least one projection used to obtain said second and third data.
[14]
14. Device according to claim 13, CHARACTERIZED by the fact that said second data comprise a first information representative of a projection format used to obtain the geometry, parameters of said projection and a flag indicating whether at least one of the projection parameters is dynamically updated.
[15]
15. Device according to claim 13 or 14, CHARACTERIZED by the fact that said third data comprise a second information representative of a projection format used to obtain the geometry,
said projection parameters and a flag indicating whether at least one of the projection parameters is dynamically updated.
[16]
16. Device according to one of claims 13 to 15, CHARACTERIZED by the fact that said metadata comprises at least one of the following information:
- information representative of at least one point of view associated with at least one projection used to obtain said geometry and texture;
- information representative of a package of 2D geometry patches, each geometry patch being associated with the projection of a part of said 3D scene;
- information representative of a 2D texture patch package, each texture patch being associated with the projection of a part of said 3D scene;
- information representative of several 3D patches, each patch being associated with a part of the 3D scene and associated with an identifier on said second and said first video track or on said third video track.
[17]
17. Bit stream CHARACTERIZED by the fact that it carries data representative of a 3D scene, said data comprising: in a first video track of a container, first data representative of the texture of said 3D scene visible from a first point of view; in at least a second video track of said container, second data representative of the geometry of said 3D scene visible from a set of points of view and from said first point of view; in a third video track of said container, third data representative of the texture of said 3D scene visible only from the points of view of said set; and metadata in said container, said metadata comprising information representative of at least one projection used to obtain said second and third data.